NPACI Transition of Supercomputer Center Researchers Sets New Data Transfer Record

Massive Data Arrives at SDSC, UC San Diego, over vBNS and on Tape from PSC and CTC

Published 04/02/1998

For further information, contact:
Ann Redelfs, NPACI/SDSC, 619-534-5032, redelfs@sdsc.edu

SAN DIEGO -- To allow users of NSF-funded supercomputer facilities to continue their computational research, SDSC, the leading-edge site for the National Partnership for Advanced Computational Infrastructure (the Partnership), has received 16 terabytes--16,000 gigabytes--of data in the past few weeks, including more than three terabytes of data over the vBNS, and more is coming soon.

As the National Science Foundation makes the transition from the Supercomputer Centers program to the Partnerships for Advanced Computational Infrastructure, the massive data transfers from the Pittsburgh Supercomputing Center (PSC) and the Cornell Theory Center (CTC) will allow researchers to continue their projects on NPACI's and NCSA's high-performance computers, including the CRAY T90, T3E, and IBM SP at SDSC.

More than three terabytes of data from PSC have been transported across the vBNS network over the past several weeks. PSC has approximately 40 terabytes of data in its archives, of which roughly one quarter is recent NSF data slated to be transferred to other sites. Approximately five terabytes remain to be transferred in the next few weeks.

"We had an enormous volume of data to move, and a limited time in which to do it," said Phil Andrews, SDSC systems manager. "Moving so large a volume of data from point to point across a wide-area network and across the country is not an everyday affair, but we have the ability to do it today." A joint paper on the file transfer process and its results will be submitted by PSC, NCSA, and SDSC for presentation at the SC98 conference in Orlando this November.

Until the advent of the vBNS, the only way to move terabytes of information was to transport hundreds or thousands of tapes to a new location. SDSC and PSC fully automated the transfer process over the vBNS. All of a user's files are read from PSC's archive in tape-optimal order, packed into two-gigabyte cpio files, and delivered through the vBNS to the destination site. "A single user's data--hundreds of gigabytes in tens of thousands of files--can be moved through the vBNS in several hours," said Matt Mathis, network engineering specialist at PSC.
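The packing step described above can be pictured with a short Python sketch. It is an illustration only, not the scripts PSC and SDSC used: the ArchivedFile record, its tape_id and offset fields, the plan_batches function, and the decimal two-gigabyte limit are all assumptions introduced here.

```python
from dataclasses import dataclass

# Assumed batch limit: two gigabytes, counted in decimal bytes.
BATCH_LIMIT = 2 * 10**9


@dataclass
class ArchivedFile:
    path: str
    tape_id: str   # hypothetical tape label
    offset: int    # hypothetical position of the file on that tape
    size: int      # file size in bytes


def plan_batches(files, limit=BATCH_LIMIT):
    """Sort files by (tape, position) and group them into batches of at most `limit` bytes.

    A single file larger than `limit` is placed in a batch of its own.
    """
    ordered = sorted(files, key=lambda f: (f.tape_id, f.offset))
    batches, current, used = [], [], 0
    for f in ordered:
        # Close the current batch if adding this file would exceed the limit.
        if current and used + f.size > limit:
            batches.append(current)
            current, used = [], 0
        current.append(f)
        used += f.size
    if current:
        batches.append(current)
    return batches
```

Sorting by tape and position first means each tape is read front to back once, and each resulting batch would then be written out as one cpio file before being sent across the network.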

Setting up such a massive and sustained data transfer is not a trivial task. The two centers are connected on the vBNS by an OC-3 link at 155 megabits per second, and fast internal networks and I/O interfaces at PSC and SDSC make up the rest of the chain. During the weekend of March 7, two streams of four megabytes per second each transferred more than half a terabyte per day. The total for the weekend was more than 1.1 terabytes of file data.
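The weekend figures can be checked with a few lines of arithmetic. The Python sketch below assumes decimal units (one terabyte taken as 1,000,000 megabytes) and streams that run at a steady rate for a full day; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
OC3_MEGABITS_PER_S = 155  # link rate quoted in the text


def daily_volume_tb(streams, mb_per_s_each):
    """Terabytes moved in one day by `streams` steady streams (decimal units assumed)."""
    total_mb = streams * mb_per_s_each * SECONDS_PER_DAY
    return total_mb / 1_000_000


def link_fraction(streams, mb_per_s_each, link_megabits_per_s=OC3_MEGABITS_PER_S):
    """Fraction of the raw link rate used by the streams (8 bits per byte)."""
    return (streams * mb_per_s_each * 8) / link_megabits_per_s


if __name__ == "__main__":
    print(f"{daily_volume_tb(2, 4):.2f} TB per day")
    print(f"{link_fraction(2, 4):.0%} of the OC-3 line rate")
```

Under these assumptions, two four-megabyte-per-second streams carry about 0.69 terabytes per day, consistent with "more than half a terabyte per day," and use roughly 41 percent of the OC-3 line rate.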

The vBNS also accelerated the process of moving and installing the CTC High Performance Storage System (HPSS) archive at SDSC. "We received three gigabytes of HPSS metadata on March 19 over the vBNS and started creating the CTC West system on Friday afternoon," said Michael Gleicher, SDSC's HPSS expert. "The metadata import was completed by Monday morning, and I was able to start actively working on setting things up for our site. The vBNS transfer of metadata saved us several days."

The bulk of the CTC data arrived in two shipments totalling 5,000 tapes, which moved the entire 13-terabyte archive from Ithaca to San Diego. CTC staff packed two copies of their archive--about 2,500 tapes each--and shipped each on a separate day to ensure the safe arrival at SDSC of at least one copy. The shipments arrived March 20 and March 21, respectively, and the one ton of tapes acclimated to the SDSC machine room over the weekend. Users have been able to access the CTC data since March 30.

Part of the challenge in moving the CTC data is that the current release of HPSS does not have built-in export capabilities. Working with CTC staff members Ruth Mitchell and Phil Pishioneri and with Jeff Deutsch, an IBM contractor at Cornell, SDSC and CTC planned, tested, and implemented the move of the entire archive. Prior to this move, CTC installed a new system (IBM's ADSM and HSM) and copied over most local and corporate data. Once the system at SDSC is up and running, the copying of local data will continue, and national users running at CTC will continue to have access to the data through HPSS utilities.

Thanks to the efforts of Gleicher and others, the data has already been made available via the new archive. The CTC West HPSS archive, independent of SDSC's production archive, uses a StorageTek silo retrofitted with 3590 tape drives to hold the Cornell tapes. About half the tapes are 3590 tapes; the other half are older 3480 tapes. The data on the 1,200 3480 tapes will be transferred to 3590 tapes over the next several weeks. (One 3590 tape holds 10 gigabytes of data, more with data compression; one 3480 tape holds approximately 200 megabytes.)
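The scale of the 3480-to-3590 migration follows from the capacities quoted above. The Python sketch below is a rough estimate only: it assumes every 3480 tape is full at 200 megabytes, ignores compression, and takes 10 gigabytes as 10,000 megabytes; the function name is illustrative.

```python
def tapes_needed(source_tapes, source_mb_each, target_mb_each):
    """Return (target tape count, total megabytes) for copying full source tapes."""
    total_mb = source_tapes * source_mb_each
    # Ceiling division with integers: round up to whole target tapes.
    count = -(-total_mb // target_mb_each)
    return count, total_mb


if __name__ == "__main__":
    count, total_mb = tapes_needed(1200, 200, 10_000)
    print(f"{total_mb / 1000:.0f} GB of data fits on {count} uncompressed 3590 tapes")
```

With these assumptions, the 1,200 older tapes hold about 240 gigabytes in total, which would fit on about 24 of the 3590 tapes.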

SDSC will run the CTC archive as a separate, read-only HPSS system. CTC users will be able to request data from the archive but will write data to SDSC's production archive. In addition, SDSC will run a background job that will move any remaining data to the production system.

"The goal is to reclaim the StorageTek silo for the SDSC system within a year," Andrews said. The StorageTek silo holds 6,000 tapes and so will form an important addition to SDSC's two IBM 3494 tape libraries, which hold 2,000 tapes each. The trio of tape libraries will play a pivotal role as the NPACI computing and data-handling infrastructure scales to a petabyte data archive in the coming years.

SDSC, at the University of California, San Diego, is sponsored by the National Science Foundation through the National Partnership for Advanced Computational Infrastructure and by other federal agencies, the State and University of California, and private organizations. For additional information about the Partnership and SDSC, see: http://www.npaci.edu/ and http://www.sdsc.edu/.